In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
print("Libraries successfully")
import warnings
warnings.filterwarnings("ignore")
Libraries successfully
In [2]:
dtypes = {
        'MachineIdentifier':                                    'category',
        'ProductName':                                          'category',
        'EngineVersion':                                        'category',
        'AppVersion':                                           'category',
        'AvSigVersion':                                         'category',
        'IsBeta':                                               'int8',
        'RtpStateBitfield':                                     'float16',
        'IsSxsPassiveMode':                                     'int8',
        'DefaultBrowsersIdentifier':                            'float16',
        'AVProductStatesIdentifier':                            'float32',
        'AVProductsInstalled':                                  'float16',
        'AVProductsEnabled':                                    'float16',
        'HasTpm':                                               'int8',
        'CountryIdentifier':                                    'int16',
        'CityIdentifier':                                       'float32',
        'OrganizationIdentifier':                               'float16',
        'GeoNameIdentifier':                                    'float16',
        'LocaleEnglishNameIdentifier':                          'int8',
        'Platform':                                             'category',
        'Processor':                                            'category',
        'OsVer':                                                'category',
        'OsBuild':                                              'int16',
        'OsSuite':                                              'int16',
        'OsPlatformSubRelease':                                 'category',
        'OsBuildLab':                                           'category',
        'SkuEdition':                                           'category',
        'IsProtected':                                          'float16',
        'AutoSampleOptIn':                                      'int8',
        'PuaMode':                                              'category',
        'SMode':                                                'float16',
        'IeVerIdentifier':                                      'float16',
        'SmartScreen':                                          'category',
        'Firewall':                                             'float16',
        'UacLuaenable':                                         'float32',
        'Census_MDC2FormFactor':                                'category',
        'Census_DeviceFamily':                                  'category',
        'Census_OEMNameIdentifier':                             'float16',
        'Census_OEMModelIdentifier':                            'float32',
        'Census_ProcessorCoreCount':                            'float16',
        'Census_ProcessorManufacturerIdentifier':               'float16',
        'Census_ProcessorModelIdentifier':                      'float16',
        'Census_ProcessorClass':                                'category',
        'Census_PrimaryDiskTotalCapacity':                      'float32',
        'Census_PrimaryDiskTypeName':                           'category',
        'Census_SystemVolumeTotalCapacity':                     'float32',
        'Census_HasOpticalDiskDrive':                           'int8',
        'Census_TotalPhysicalRAM':                              'float32',
        'Census_ChassisTypeName':                               'category',
        'Census_InternalPrimaryDiagonalDisplaySizeInInches':    'float16',
        'Census_InternalPrimaryDisplayResolutionHorizontal':    'float16',
        'Census_InternalPrimaryDisplayResolutionVertical':      'float16',
        'Census_PowerPlatformRoleName':                         'category',
        'Census_InternalBatteryType':                           'category',
        'Census_InternalBatteryNumberOfCharges':                'float32',
        'Census_OSVersion':                                     'category',
        'Census_OSArchitecture':                                'category',
        'Census_OSBranch':                                      'category',
        'Census_OSBuildNumber':                                 'int16',
        'Census_OSBuildRevision':                               'int32',
        'Census_OSEdition':                                     'category',
        'Census_OSSkuName':                                     'category',
        'Census_OSInstallTypeName':                             'category',
        'Census_OSInstallLanguageIdentifier':                   'float16',
        'Census_OSUILocaleIdentifier':                          'int16',
        'Census_OSWUAutoUpdateOptionsName':                     'category',
        'Census_IsPortableOperatingSystem':                     'int8',
        'Census_GenuineStateName':                              'category',
        'Census_ActivationChannel':                             'category',
        'Census_IsFlightingInternal':                           'float16',
        'Census_IsFlightsDisabled':                             'float16',
        'Census_FlightRing':                                    'category',
        'Census_ThresholdOptIn':                                'float16',
        'Census_FirmwareManufacturerIdentifier':                'float16',
        'Census_FirmwareVersionIdentifier':                     'float32',
        'Census_IsSecureBootEnabled':                           'int8',
        'Census_IsWIMBootEnabled':                              'float16',
        'Census_IsVirtualDevice':                               'float16',
        'Census_IsTouchEnabled':                                'int8',
        'Census_IsPenCapable':                                  'int8',
        'Census_IsAlwaysOnAlwaysConnectedCapable':              'float16',
        'Wdft_IsGamer':                                         'float16',
        'Wdft_RegionIdentifier':                                'float16',
        'HasDetections':                                        'int8'
        }
print("Data types added")
Data types added
In [3]:
df = pd.read_csv("../Data/train_clean.csv",dtype=dtypes)
df.head()
Out[3]:
Unnamed: 0 SmartScreen OrganizationIdentifier SMode Wdft_IsGamer Wdft_RegionIdentifier Census_InternalBatteryNumberOfCharges Census_FirmwareManufacturerIdentifier Census_IsFlightsDisabled Census_FirmwareVersionIdentifier ... OsPlatformSubRelease SkuEdition AutoSampleOptIn Census_MDC2FormFactor Census_DeviceFamily ProductName Census_HasOpticalDiskDrive Census_OSVersion Census_OSArchitecture HasDetections
0 0 requireadmin 18.0 0.0 0.0 10.0 4.294967e+09 628.0 0.0 36144.0 ... rs4 Pro 0 Desktop Windows.Desktop win8defender 0 10.0.17134.165 amd64 0
1 1 requireadmin 18.0 0.0 0.0 8.0 1.000000e+00 628.0 0.0 57858.0 ... rs4 Pro 0 Notebook Windows.Desktop win8defender 0 10.0.17134.1 amd64 0
2 2 requireadmin 18.0 0.0 0.0 3.0 4.294967e+09 142.0 0.0 52682.0 ... rs4 Home 0 Desktop Windows.Desktop win8defender 0 10.0.17134.165 amd64 0
3 3 existsnotset 27.0 0.0 0.0 3.0 4.294967e+09 355.0 0.0 20050.0 ... rs4 Pro 0 Desktop Windows.Desktop win8defender 0 10.0.17134.228 amd64 1
4 4 requireadmin 27.0 0.0 0.0 1.0 0.000000e+00 355.0 0.0 19844.0 ... rs4 Home 0 Notebook Windows.Desktop win8defender 0 10.0.17134.191 amd64 1

5 rows × 73 columns

In [356]:
df.drop(columns="Unnamed: 0",inplace=True)
df.head()
Out[356]:
SmartScreen OrganizationIdentifier SMode Wdft_IsGamer Wdft_RegionIdentifier Census_InternalBatteryNumberOfCharges Census_FirmwareManufacturerIdentifier Census_IsFlightsDisabled Census_FirmwareVersionIdentifier Census_OEMModelIdentifier ... OsPlatformSubRelease SkuEdition AutoSampleOptIn Census_MDC2FormFactor Census_DeviceFamily ProductName Census_HasOpticalDiskDrive Census_OSVersion Census_OSArchitecture HasDetections
0 requireadmin 18.0 0.0 0.0 10.0 4.294967e+09 628.0 0.0 36144.0 9124.0 ... rs4 Pro 0 Desktop Windows.Desktop win8defender 0 10.0.17134.165 amd64 0
1 requireadmin 18.0 0.0 0.0 8.0 1.000000e+00 628.0 0.0 57858.0 91656.0 ... rs4 Pro 0 Notebook Windows.Desktop win8defender 0 10.0.17134.1 amd64 0
2 requireadmin 18.0 0.0 0.0 3.0 4.294967e+09 142.0 0.0 52682.0 317701.0 ... rs4 Home 0 Desktop Windows.Desktop win8defender 0 10.0.17134.165 amd64 0
3 existsnotset 27.0 0.0 0.0 3.0 4.294967e+09 355.0 0.0 20050.0 275890.0 ... rs4 Pro 0 Desktop Windows.Desktop win8defender 0 10.0.17134.228 amd64 1
4 requireadmin 27.0 0.0 0.0 1.0 0.000000e+00 355.0 0.0 19844.0 331929.0 ... rs4 Home 0 Notebook Windows.Desktop win8defender 0 10.0.17134.191 amd64 1

5 rows × 72 columns

In [417]:
df.shape
Out[417]:
(7963858, 72)
In [418]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7963858 entries, 0 to 7963857
Data columns (total 72 columns):
 #   Column                                             Dtype   
---  ------                                             -----   
 0   SmartScreen                                        category
 1   OrganizationIdentifier                             float16 
 2   SMode                                              float16 
 3   Wdft_IsGamer                                       float16 
 4   Wdft_RegionIdentifier                              float16 
 5   Census_InternalBatteryNumberOfCharges              float32 
 6   Census_FirmwareManufacturerIdentifier              float16 
 7   Census_IsFlightsDisabled                           float16 
 8   Census_FirmwareVersionIdentifier                   float32 
 9   Census_OEMModelIdentifier                          float32 
 10  Census_OEMNameIdentifier                           float16 
 11  Firewall                                           float16 
 12  Census_TotalPhysicalRAM                            float32 
 13  Census_IsAlwaysOnAlwaysConnectedCapable            float16 
 14  Census_OSInstallLanguageIdentifier                 float16 
 15  IeVerIdentifier                                    float16 
 16  Census_PrimaryDiskTotalCapacity                    float32 
 17  Census_SystemVolumeTotalCapacity                   float32 
 18  Census_InternalPrimaryDiagonalDisplaySizeInInches  float16 
 19  Census_InternalPrimaryDisplayResolutionHorizontal  float16 
 20  Census_InternalPrimaryDisplayResolutionVertical    float16 
 21  Census_ProcessorModelIdentifier                    float16 
 22  Census_ProcessorManufacturerIdentifier             float16 
 23  Census_ProcessorCoreCount                          float16 
 24  AVProductStatesIdentifier                          float32 
 25  AVProductsInstalled                                float16 
 26  AVProductsEnabled                                  float16 
 27  IsProtected                                        float16 
 28  RtpStateBitfield                                   float16 
 29  Census_IsVirtualDevice                             float16 
 30  Census_PrimaryDiskTypeName                         category
 31  UacLuaenable                                       float32 
 32  Census_ChassisTypeName                             category
 33  GeoNameIdentifier                                  float16 
 34  Census_PowerPlatformRoleName                       category
 35  Census_OSInstallTypeName                           category
 36  Census_OSSkuName                                   category
 37  Census_OSWUAutoUpdateOptionsName                   category
 38  Census_OSUILocaleIdentifier                        int16   
 39  Census_IsPortableOperatingSystem                   int8    
 40  Census_IsPenCapable                                int8    
 41  Census_OSBuildRevision                             int32   
 42  Census_GenuineStateName                            category
 43  Census_ActivationChannel                           category
 44  Census_IsTouchEnabled                              int8    
 45  Census_FlightRing                                  category
 46  Census_IsSecureBootEnabled                         int8    
 47  Census_OSEdition                                   category
 48  Census_OSBuildNumber                               int16   
 49  OsVer                                              category
 50  EngineVersion                                      category
 51  AppVersion                                         category
 52  IsBeta                                             int8    
 53  IsSxsPassiveMode                                   int8    
 54  HasTpm                                             int8    
 55  CountryIdentifier                                  int16   
 56  LocaleEnglishNameIdentifier                        int8    
 57  Platform                                           category
 58  Processor                                          category
 59  OsBuild                                            int16   
 60  Census_OSBranch                                    category
 61  OsSuite                                            int16   
 62  OsPlatformSubRelease                               category
 63  SkuEdition                                         category
 64  AutoSampleOptIn                                    int8    
 65  Census_MDC2FormFactor                              category
 66  Census_DeviceFamily                                category
 67  ProductName                                        category
 68  Census_HasOpticalDiskDrive                         int8    
 69  Census_OSVersion                                   category
 70  Census_OSArchitecture                              category
 71  HasDetections                                      int8    
dtypes: category(24), float16(23), float32(8), int16(5), int32(1), int8(11)
memory usage: 972.2 MB
In [359]:
def listRemove(list_name,col):
    try:
        list_name.remove(col)
    except ValueError:
        print("Issue while removing column")
        pass
    return list_name
In [9]:
#!conda create -n jupyterlab-debugger -c conda-forge jupyterlab=3 xeus-python
#!conda activate jupyterlab-debugger
In [360]:
def columnsTypesSeggregation(df):
    boolean_cols,cat_cols,num_cols = [],[],[]
    for column in list(df.columns):
        if df[column].nunique() == 2.0:
            boolean_cols.append(column)
    print("No of Boolean Columns = "+str(len(boolean_cols)))
    rem_cols = list([elem for elem in list(df.columns) if elem not in boolean_cols ])
    #print("no of rem cols = "+str(len(rem_cols)))
    for col in rem_cols:
        if df[col].dtype in ['float16','int8','float32','int16']:
            num_cols.append(col)
        else :
            cat_cols.append(col)
    print("No of Category columns = "+str(len(cat_cols)))
    print("No of Number columns = "+str(len(num_cols)))
    return boolean_cols,cat_cols,num_cols
print("seggregation column ready to run")
seggregation column ready to run
In [361]:
boolean_cols,cat_cols,num_cols=columnsTypesSeggregation(df)
No of Boolean Columns = 17
No of Category columns = 25
No of Number columns = 30
In [362]:
boolean_cols.remove("HasDetections")
boolean_cols
Out[362]:
['SMode',
 'Wdft_IsGamer',
 'Census_IsFlightsDisabled',
 'Firewall',
 'Census_IsAlwaysOnAlwaysConnectedCapable',
 'IsProtected',
 'Census_IsVirtualDevice',
 'Census_IsPortableOperatingSystem',
 'Census_IsPenCapable',
 'Census_IsTouchEnabled',
 'Census_IsSecureBootEnabled',
 'IsBeta',
 'IsSxsPassiveMode',
 'HasTpm',
 'AutoSampleOptIn',
 'Census_HasOpticalDiskDrive']
In [363]:
cat_cols
Out[363]:
['SmartScreen',
 'Census_PrimaryDiskTypeName',
 'Census_ChassisTypeName',
 'Census_PowerPlatformRoleName',
 'Census_OSInstallTypeName',
 'Census_OSSkuName',
 'Census_OSWUAutoUpdateOptionsName',
 'Census_OSBuildRevision',
 'Census_GenuineStateName',
 'Census_ActivationChannel',
 'Census_FlightRing',
 'Census_OSEdition',
 'OsVer',
 'EngineVersion',
 'AppVersion',
 'Platform',
 'Processor',
 'Census_OSBranch',
 'OsPlatformSubRelease',
 'SkuEdition',
 'Census_MDC2FormFactor',
 'Census_DeviceFamily',
 'ProductName',
 'Census_OSVersion',
 'Census_OSArchitecture']
In [364]:
num_cols
Out[364]:
['OrganizationIdentifier',
 'Wdft_RegionIdentifier',
 'Census_InternalBatteryNumberOfCharges',
 'Census_FirmwareManufacturerIdentifier',
 'Census_FirmwareVersionIdentifier',
 'Census_OEMModelIdentifier',
 'Census_OEMNameIdentifier',
 'Census_TotalPhysicalRAM',
 'Census_OSInstallLanguageIdentifier',
 'IeVerIdentifier',
 'Census_PrimaryDiskTotalCapacity',
 'Census_SystemVolumeTotalCapacity',
 'Census_InternalPrimaryDiagonalDisplaySizeInInches',
 'Census_InternalPrimaryDisplayResolutionHorizontal',
 'Census_InternalPrimaryDisplayResolutionVertical',
 'Census_ProcessorModelIdentifier',
 'Census_ProcessorManufacturerIdentifier',
 'Census_ProcessorCoreCount',
 'AVProductStatesIdentifier',
 'AVProductsInstalled',
 'AVProductsEnabled',
 'RtpStateBitfield',
 'UacLuaenable',
 'GeoNameIdentifier',
 'Census_OSUILocaleIdentifier',
 'Census_OSBuildNumber',
 'CountryIdentifier',
 'LocaleEnglishNameIdentifier',
 'OsBuild',
 'OsSuite']

let's create a matrix for boolean columns by writing a function

In [424]:
def boolean_cols_eda(df,column,dependent="HasDetections"):
    eda_df = df[[column,dependent]].groupby([column,dependent]).size().unstack(fill_value=0)
    eda_df = pd.DataFrame({"No Malware detected":eda_df[0],"Malware Detected":eda_df[1]})
    return eda_df
In [425]:
#Let's make a function for all boolean to calculated this way.
print("\nBoolean columns Summary")
for col in boolean_cols:
    eda_df = boolean_cols_eda(df,col)
    print("\n\n "+col+" Vs HasDetections\n\n")
    colors = ['#2ca02c','#d62728']
    fig,ax = plt.subplots(nrows=1,ncols=2,figsize=(15,7))
    eda_df.iloc[0][["No Malware detected","Malware Detected"]].plot(kind='bar',ax=ax[0],color=colors)
    ax[0].set_title("When '"+col+"' is 0")
    eda_df.iloc[1][["No Malware detected","Malware Detected"]].plot(kind='bar',ax=ax[1],color=colors)
    ax[1].set_title("When '"+col+"' is 1")
    fig.tight_layout(pad=3.0)
    plt.show()
    eda_df["Total of each category"] = eda_df.sum(axis=1)
    eda_df["% of Malware detected"] = (eda_df["Malware Detected"]*100)/(eda_df["Malware Detected"]+eda_df["No Malware detected"])
    print(eda_df)
print("\nBoolean columns Summary")
Boolean columns Summary


 SMode Vs HasDetections


       No Malware detected  Malware Detected  Total of each category  \
SMode                                                                  
0.0                3970036           3990340                 7960376   
1.0                   2945               537                    3482   

       % of Malware detected  
SMode                         
0.0                50.127532  
1.0                15.422171  


 Wdft_IsGamer Vs HasDetections


              No Malware detected  Malware Detected  Total of each category  \
Wdft_IsGamer                                                                  
0.0                       2910136           2720547                 5630683   
1.0                       1062845           1270330                 2333175   

              % of Malware detected  
Wdft_IsGamer                         
0.0                       48.316465  
1.0                       54.446409  


 Census_IsFlightsDisabled Vs HasDetections


                          No Malware detected  Malware Detected  \
Census_IsFlightsDisabled                                          
0.0                                   3972905           3990873   
1.0                                        76                 4   

                          Total of each category  % of Malware detected  
Census_IsFlightsDisabled                                                 
0.0                                      7963778              50.112811  
1.0                                           80               5.000000  


 Firewall Vs HasDetections


          No Malware detected  Malware Detected  Total of each category  \
Firewall                                                                  
0.0                     82110             80266                  162376   
1.0                   3890871           3910611                 7801482   

          % of Malware detected  
Firewall                         
0.0                   49.432182  
1.0                   50.126514  


 Census_IsAlwaysOnAlwaysConnectedCapable Vs HasDetections


                                         No Malware detected  \
Census_IsAlwaysOnAlwaysConnectedCapable                        
0.0                                                  3683883   
1.0                                                   289098   

                                         Malware Detected  \
Census_IsAlwaysOnAlwaysConnectedCapable                     
0.0                                               3819341   
1.0                                                171536   

                                         Total of each category  \
Census_IsAlwaysOnAlwaysConnectedCapable                           
0.0                                                     7503224   
1.0                                                      460634   

                                         % of Malware detected  
Census_IsAlwaysOnAlwaysConnectedCapable                         
0.0                                                  50.902665  
1.0                                                  37.239110  


 IsProtected Vs HasDetections


             No Malware detected  Malware Detected  Total of each category  \
IsProtected                                                                  
0.0                       269344            161484                  430828   
1.0                      3703637           3829393                 7533030   

             % of Malware detected  
IsProtected                         
0.0                      37.482243  
1.0                      50.834697  


 Census_IsVirtualDevice Vs HasDetections


                        No Malware detected  Malware Detected  \
Census_IsVirtualDevice                                          
0.0                                 3953699           3986286   
1.0                                   19282              4591   

                        Total of each category  % of Malware detected  
Census_IsVirtualDevice                                                 
0.0                                    7939985              50.205208  
1.0                                      23873              19.230930  


 Census_IsPortableOperatingSystem Vs HasDetections


                                  No Malware detected  Malware Detected  \
Census_IsPortableOperatingSystem                                          
0                                             3971171           3988463   
1                                                1810              2414   

                                  Total of each category  \
Census_IsPortableOperatingSystem                           
0                                                7959634   
1                                                   4224   

                                  % of Malware detected  
Census_IsPortableOperatingSystem                         
0                                             50.108623  
1                                             57.149621  


 Census_IsPenCapable Vs HasDetections


                     No Malware detected  Malware Detected  \
Census_IsPenCapable                                          
0                                3800829           3845208   
1                                 172152            145669   

                     Total of each category  % of Malware detected  
Census_IsPenCapable                                                 
0                                   7646037              50.290209  
1                                    317821              45.833661  


 Census_IsTouchEnabled Vs HasDetections


                       No Malware detected  Malware Detected  \
Census_IsTouchEnabled                                          
0                                  3400424           3525387   
1                                   572557            465490   

                       Total of each category  % of Malware detected  
Census_IsTouchEnabled                                                 
0                                     6925811              50.902154  
1                                     1038047              44.842864  


 Census_IsSecureBootEnabled Vs HasDetections


                            No Malware detected  Malware Detected  \
Census_IsSecureBootEnabled                                          
0                                       1966520           1977413   
1                                       2006461           2013464   

                            Total of each category  % of Malware detected  
Census_IsSecureBootEnabled                                                 
0                                          3943933              50.138098  
1                                          4019925              50.087104  


 IsBeta Vs HasDetections


        No Malware detected  Malware Detected  Total of each category  \
IsBeta                                                                  
0                   3972971           3990865                 7963836   
1                        10                12                      22   

        % of Malware detected  
IsBeta                         
0                   50.112345  
1                   54.545455  


 IsSxsPassiveMode Vs HasDetections


                  No Malware detected  Malware Detected  \
IsSxsPassiveMode                                          
0                             3880644           3937212   
1                               92337             53665   

                  Total of each category  % of Malware detected  
IsSxsPassiveMode                                                 
0                                7817856              50.361787  
1                                 146002              36.756346  


 HasTpm Vs HasDetections


        No Malware detected  Malware Detected  Total of each category  \
HasTpm                                                                  
0                     20159             15641                   35800   
1                   3952822           3975236                 7928058   

        % of Malware detected  
HasTpm                         
0                   43.689944  
1                   50.141359  


 AutoSampleOptIn Vs HasDetections


                 No Malware detected  Malware Detected  \
AutoSampleOptIn                                          
0                            3972893           3990764   
1                                 88               113   

                 Total of each category  % of Malware detected  
AutoSampleOptIn                                                 
0                               7963657              50.112203  
1                                   201              56.218905  


 Census_HasOpticalDiskDrive Vs HasDetections


                            No Malware detected  Malware Detected  \
Census_HasOpticalDiskDrive                                          
0                                       3674165           3643160   
1                                        298816            347717   

                            Total of each category  % of Malware detected  
Census_HasOpticalDiskDrive                                                 
0                                          7317325              49.788140  
1                                           646533              53.781787  

Boolean columns Summary

Summary of Boolean columns

  • When SMode is on (1), the chances of getting infected by a malware is reduced by 35 % (50 to 15%)
  • When Census_IsFlightsDiabled then, only 5% is infected with malware. But total number of systems that had the feature enabled is only 80 out of 8 million, it shows this feature is known to very few and used only by a handful of people. The reason for less malware attack % might be because the feature may not be known to majority of the people.
  • Enabling Census_IsAlwaysOnAlwaysConnectedCapable reduces chance of getting affected by a malware by 13% (50 to 37%)
  • When system is not protected (IsProtected) the chance of being affected by malware is less by 13% when coompared to those systems that have protected switched. Ironically, this was the case but the scenario might be something like If a system is not equipped with malware protection software which detects some virus, one might not even be aware of all the virus that's present in one's system easily when compared to those which have malware protection software that checks for viruses and report once found.
  • Machines running on virtual devices(Census_IsVirtualDevice) have a percentage of only 19% of them being infected.

Category Columns

In [18]:
def cat_cols_eda(df,column,dependent="HasDetections"):
    eda_df = df[[column,dependent]].groupby([column,dependent]).size().unstack(fill_value=0)
    eda_df = pd.DataFrame({"No Malware detected":eda_df[0],"Malware Detected":eda_df[1]})
    return eda_df
In [21]:
for col in cat_cols:
    print("\n\n "+col+" Vs HasDetections\n\n")
    eda_df = cat_cols_eda(df,col)
    eda_df["Total of each category"] = eda_df.sum(axis=1)
    eda_df["% of Malware detected"] = (eda_df["Malware Detected"]*100)/(eda_df["Malware Detected"]+eda_df["No Malware detected"])
    eda_df=eda_df[eda_df["Total of each category"] > 100 ]
    eda_df.sort_values(by="% of Malware detected",ascending=False,inplace=True)
    fig,ax=plt.subplots(1,2,figsize=(20,8))
    temp1=eda_df[eda_df["% of Malware detected"] > 52.0]
    if temp1.shape[0] > 0:
        temp1[["No Malware detected","Malware Detected"]].plot(kind='bar',ax=ax[0])
        ax[0].set_title("Most Vulnerable categories in '"+col+"' - high percentages in malware detected")
    
    temp2=eda_df[eda_df["% of Malware detected"] < 48.0]
    if temp2.shape[0] > 0 :
        temp2[["No Malware detected","Malware Detected"]].plot(kind='bar',ax=ax[1])
        ax[1].set_title("Safest categories in '"+col+"' - lowest percentages in malware detected")
    plt.show()
    print(eda_df)
print("\nCategory Columns Summary")

 SmartScreen Vs HasDetections


              No Malware detected  Malware Detected  Total of each category  \
SmartScreen                                                                   
existsnotset               181909            779360                  961269   
&#x01;                         99               182                     281   
warn                        51934             73098                  125032   
on                            352               463                     815   
block                        9780             10936                   20716   
&#x02;                        182               199                     381   
off                         78217             79193                  157410   
prompt                      15254             14051                   29305   
requireadmin              3635253           3033390                 6668643   

              % of Malware detected  
SmartScreen                          
existsnotset              81.076161  
&#x01;                    64.768683  
warn                      58.463433  
on                        56.809816  
block                     52.790114  
&#x02;                    52.230971  
off                       50.310018  
prompt                    47.947449  
requireadmin              45.487365  


 Census_PrimaryDiskTypeName Vs HasDetections


                            No Malware detected  Malware Detected  \
Census_PrimaryDiskTypeName                                          
HDD                                     2627683           2691396   
SSD                                     1095585           1098060   
UNKNOWN                                  127370            105625   
Unspecified                              122343             95796   

                            Total of each category  % of Malware detected  
Census_PrimaryDiskTypeName                                                 
HDD                                        5319079              50.598910  
SSD                                        2193645              50.056413  
UNKNOWN                                     232995              45.333591  
Unspecified                                 218139              43.915118  


 Census_ChassisTypeName Vs HasDetections


                        No Malware detected  Malware Detected  \
Census_ChassisTypeName                                          
MiniTower                             34774             38290   
Desktop                              761380            838172   
AllinOne                              86470             92497   
LowProfileDesktop                     21877             22193   
Notebook                            2396715           2413692   
Tower                                  4548              4578   
Laptop                               314507            305806   
SpaceSaving                           12786             12353   
Convertible                           40873             38026   
Portable                             171545            157747   
UNKNOWN                               17225             13836   
LunchBox                               1750              1387   
Unknown                                2595              1700   
Tablet                                 7877              4871   
MainServerChassis                      4803              2896   
MiniPC                                 2532              1473   
BusExpansionChassis                     453               240   
StickPC                                  79                40   
SubNotebook                             483               242   
30                                      151                75   
Detachable                            32941             16247   
RackMountChassis                        489               233   
Other                                 26327             11869   
HandHeld                              29630             12294   

                        Total of each category  % of Malware detected  
Census_ChassisTypeName                                                 
MiniTower                                73064              52.406110  
Desktop                                1599552              52.400422  
AllinOne                                178967              51.683830  
LowProfileDesktop                        44070              50.358521  
Notebook                               4810407              50.176461  
Tower                                     9126              50.164366  
Laptop                                  620313              49.298661  
SpaceSaving                              25139              49.138788  
Convertible                              78899              48.195795  
Portable                                329292              47.904899  
UNKNOWN                                  31061              44.544606  
LunchBox                                  3137              44.214217  
Unknown                                   4295              39.580908  
Tablet                                   12748              38.209915  
MainServerChassis                         7699              37.615275  
MiniPC                                    4005              36.779026  
BusExpansionChassis                        693              34.632035  
StickPC                                    119              33.613445  
SubNotebook                                725              33.379310  
30                                         226              33.185841  
Detachable                               49188              33.030414  
RackMountChassis                           722              32.271468  
Other                                    38196              31.073934  
HandHeld                                 41924              29.324492  


 Census_PowerPlatformRoleName Vs HasDetections


                              No Malware detected  Malware Detected  \
Census_PowerPlatformRoleName                                          
Desktop                                    832506            928100   
Workstation                                 37624             40351   
Mobile                                    2797988           2839316   
SOHOServer                                  16054             14776   
UNKNOWN                                      2961              2508   
Slate                                      281661            164111   
EnterpriseServer                             1425               815   
AppliancePC                                  2729               878   

                              Total of each category  % of Malware detected  
Census_PowerPlatformRoleName                                                 
Desktop                                      1760606              52.714804  
Workstation                                    77975              51.748637  
Mobile                                       5637304              50.366558  
SOHOServer                                     30830              47.927343  
UNKNOWN                                         5469              45.858475  
Slate                                         445772              36.815009  
EnterpriseServer                                2240              36.383929  
AppliancePC                                     3607              24.341558  


 Census_OSInstallTypeName Vs HasDetections


                          No Malware detected  Malware Detected  \
Census_OSInstallTypeName                                          
UUPUpgrade                            1186408           1333940   
IBSClean                               615714            672032   
Clean                                   29970             32056   
Reset                                  289557            278699   
Other                                  387131            358619   
Update                                 732022            660853   
Upgrade                                617868            555903   
CleanPCRefresh                          23707             21074   
Refresh                                 90604             77701   

                          Total of each category  % of Malware detected  
Census_OSInstallTypeName                                                 
UUPUpgrade                               2520348              52.926818  
IBSClean                                 1287746              52.186689  
Clean                                      62026              51.681553  
Reset                                     568256              49.044621  
Other                                     745750              48.088367  
Update                                   1392875              47.445248  
Upgrade                                  1173771              47.360431  
CleanPCRefresh                             44781              47.060137  
Refresh                                   168305              46.166781  


 Census_OSSkuName Vs HasDetections


                      No Malware detected  Malware Detected  \
Census_OSSkuName                                              
ENTERPRISE_S_N                        298               501   
EDUCATION                           16923             20330   
ENTERPRISE_N                          136               156   
CORE_SINGLELANGUAGE                844046            953060   
ENTERPRISE_S                         7883              8892   
PROFESSIONAL_N                      11143             12423   
ENTERPRISE                          15071             16734   
PROFESSIONAL                      1347907           1414429   
CORE_COUNTRYSPECIFIC                72925             73623   
EDUCATION_N                           432               394   
CORE                              1650588           1486289   
CORE_N                               2165              1839   
CLOUD                                3400              2156   

                      Total of each category  % of Malware detected  
Census_OSSkuName                                                     
ENTERPRISE_S_N                           799              62.703379  
EDUCATION                              37253              54.572786  
ENTERPRISE_N                             292              53.424658  
CORE_SINGLELANGUAGE                  1797106              53.033043  
ENTERPRISE_S                           16775              53.007452  
PROFESSIONAL_N                         23566              52.715777  
ENTERPRISE                             31805              52.614369  
PROFESSIONAL                         2762336              51.204090  
CORE_COUNTRYSPECIFIC                  146548              50.238147  
EDUCATION_N                              826              47.699758  
CORE                                 3136877              47.381169  
CORE_N                                  4004              45.929071  
CLOUD                                   5556              38.804896  


 Census_OSWUAutoUpdateOptionsName Vs HasDetections


                                       No Malware detected  Malware Detected  \
Census_OSWUAutoUpdateOptionsName                                               
FullAuto                                           1703158           1851505   
UNKNOWN                                            1057791           1030045   
AutoInstallAndRebootAtMaintenanceTime               180679            173550   
Off                                                  12952             11951   
Notify                                             1018401            923826   

                                       Total of each category  \
Census_OSWUAutoUpdateOptionsName                                
FullAuto                                              3554663   
UNKNOWN                                               2087836   
AutoInstallAndRebootAtMaintenanceTime                  354229   
Off                                                     24903   
Notify                                                1942227   

                                       % of Malware detected  
Census_OSWUAutoUpdateOptionsName                              
FullAuto                                           52.086654  
UNKNOWN                                            49.335532  
AutoInstallAndRebootAtMaintenanceTime              48.993730  
Off                                                47.990202  
Notify                                             47.565295  


 Census_OSBuildRevision Vs HasDetections


                        No Malware detected  Malware Detected  \
Census_OSBuildRevision                                          
2396                                     76               186   
17914                                   518              1229   
17918                                   153               313   
17946                                  1181              1982   
17889                                   587               949   
...                                     ...               ...   
10                                      168                85   
1378                                     71                34   
2248                                   1016               398   
2312                                    353               120   
2273                                    113                29   

                        Total of each category  % of Malware detected  
Census_OSBuildRevision                                                 
2396                                       262              70.992366  
17914                                     1747              70.349170  
17918                                      466              67.167382  
17946                                     3163              62.662030  
17889                                     1536              61.783854  
...                                        ...                    ...  
10                                         253              33.596838  
1378                                       105              32.380952  
2248                                      1414              28.147100  
2312                                       473              25.369979  
2273                                       142              20.422535  

[202 rows x 4 columns]


 Census_GenuineStateName Vs HasDetections


                         No Malware detected  Malware Detected  \
Census_GenuineStateName                                          
OFFLINE                                52741             64949   
IS_GENUINE                           3584704           3599578   
INVALID_LICENSE                       330604            323486   
UNKNOWN                                 4931              2864   

                         Total of each category  % of Malware detected  
Census_GenuineStateName                                                 
OFFLINE                                  117690              55.186507  
IS_GENUINE                              7184282              50.103518  
INVALID_LICENSE                          654090              49.455885  
UNKNOWN                                    7795              36.741501  


 Census_ActivationChannel Vs HasDetections


                          No Malware detected  Malware Detected  \
Census_ActivationChannel                                          
Volume:GVLK                            151233            222190   
OEM:NONSLP                             140192            156620   
OEM:DM                                1591592           1591293   
Retail                                2086138           2017483   
Volume:MAK                               3826              3291   

                          Total of each category  % of Malware detected  
Census_ActivationChannel                                                 
Volume:GVLK                               373423              59.500888  
OEM:NONSLP                                296812              52.767408  
OEM:DM                                   3182885              49.995303  
Retail                                   4103621              49.163483  
Volume:MAK                                  7117              46.241394  


 Census_FlightRing Vs HasDetections


                   No Malware detected  Malware Detected  \
Census_FlightRing                                          
Disabled                          1540              1661   
NOT_SET                         120888            121895   
Retail                         3730216           3760350   
RP                                4497              4497   
Unknown                         105406             94075   
WIF                               4956              4247   
WIS                               5469              4151   

                   Total of each category  % of Malware detected  
Census_FlightRing                                                 
Disabled                             3201              51.890034  
NOT_SET                            242783              50.207387  
Retail                            7490566              50.201146  
RP                                   8994              50.000000  
Unknown                            199481              47.159880  
WIF                                  9203              46.147995  
WIS                                  9620              43.149688  


 Census_OSEdition Vs HasDetections


                        No Malware detected  Malware Detected  \
Census_OSEdition                                                
EnterpriseSN                            298               499   
ProfessionalEducation                 24450             29974   
Education                             16897             20302   
EnterpriseN                             133               157   
CoreSingleLanguage                   844126            953201   
EnterpriseS                            7880              8892   
ProfessionalN                         11067             12348   
Enterprise                            15063             16737   
Professional                        1323280           1384212   
ProfessionalEducationN                   77                79   
CoreCountrySpecific                   72980             73705   
EducationN                              433               397   
Core                                1650625           1486300   
CoreN                                  2167              1839   
Cloud                                  3442              2188   

                        Total of each category  % of Malware detected  
Census_OSEdition                                                       
EnterpriseSN                               797              62.609787  
ProfessionalEducation                    54424              55.074967  
Education                                37199              54.576736  
EnterpriseN                                290              54.137931  
CoreSingleLanguage                     1797327              53.034367  
EnterpriseS                              16772              53.016933  
ProfessionalN                            23415              52.735426  
Enterprise                               31800              52.632075  
Professional                           2707492              51.125248  
ProfessionalEducationN                     156              50.641026  
CoreCountrySpecific                     146685              50.247128  
EducationN                                 830              47.831325  
Core                                   3136925              47.380795  
CoreN                                     4006              45.906141  
Cloud                                     5630              38.863233  


 OsVer Vs HasDetections


          No Malware detected  Malware Detected  Total of each category  \
OsVer                                                                     
10.0.3.0                   81               121                     202   
10.0.1.0                   55                71                     126   
6.1.0.0                   115               128                     243   
6.3.0.0                 50363             50744                  101107   
10.0.0.0              3906923           3927512                 7834435   
6.1.1.0                 15351             12196                   27547   

          % of Malware detected  
OsVer                            
10.0.3.0              59.900990  
10.0.1.0              56.349206  
6.1.0.0               52.674897  
6.3.0.0               50.188414  
10.0.0.0              50.131401  
6.1.1.0               44.273424  


 EngineVersion Vs HasDetections


               No Malware detected  Malware Detected  Total of each category  \
EngineVersion                                                                  
1.1.15100.1                1487972           1867048                 3355020   
1.1.15300.5                  27761             29866                   57627   
1.1.15200.1                1733478           1677284                 3410762   
1.1.15300.6                  54913             51387                  106300   
1.1.13504.0                  33980             24390                   58370   
1.1.14104.0                  40336             28686                   69022   
1.1.14305.0                   2590              1807                    4397   
1.1.14600.4                  64544             43025                  107569   
1.1.14202.0                   7964              5303                   13267   
1.1.14700.4                    606               398                    1004   
1.1.14500.5                  25059             16118                   41177   
1.1.14306.0                  12975              8141                   21116   
1.1.14405.2                  19970             12383                   32353   
1.1.14003.0                   7762              4738                   12500   
1.1.13903.0                   5240              3101                    8341   
1.1.13804.0                   5151              3032                    8183   
1.1.14700.3                    763               448                    1211   
1.1.12805.0                    964               566                    1530   
1.1.13701.0                   2728              1541                    4269   
1.1.13103.0                   2273              1276                    3549   
1.1.13704.0                   2683              1486                    4169   
1.1.12902.0                   2659              1436                    4095   
1.1.13407.0                   4844              2600                    7444   
1.1.13202.0                   2499              1340                    3839   
1.1.15000.2                 158781             84537                  243318   
1.1.13000.0                   2110              1090                    3200   
1.1.13601.0                   4086              2106                    6192   
1.1.13303.0                   5126              2541                    7667   
1.1.14700.5                  28508             13804                   42312   
1.1.14800.3                  85274             38403                  123677   
1.1.14303.0                    166                73                     239   
1.1.14901.4                 135249             59168                  194417   
1.1.14901.3                   1022               442                    1464   
1.1.14201.0                    142                61                     203   
1.1.15000.1                   1595               677                    2272   
1.1.14800.1                    612               259                     871   
1.1.14500.2                    182                74                     256   

               % of Malware detected  
EngineVersion                         
1.1.15100.1                55.649385  
1.1.15300.5                51.826401  
1.1.15200.1                49.176225  
1.1.15300.6                48.341486  
1.1.13504.0                41.785164  
1.1.14104.0                41.560662  
1.1.14305.0                41.096202  
1.1.14600.4                39.997583  
1.1.14202.0                39.971358  
1.1.14700.4                39.641434  
1.1.14500.5                39.143211  
1.1.14306.0                38.553703  
1.1.14405.2                38.274658  
1.1.14003.0                37.904000  
1.1.13903.0                37.177796  
1.1.13804.0                37.052426  
1.1.14700.3                36.994220  
1.1.12805.0                36.993464  
1.1.13701.0                36.097447  
1.1.13103.0                35.953790  
1.1.13704.0                35.644039  
1.1.12902.0                35.067155  
1.1.13407.0                34.927458  
1.1.13202.0                34.904923  
1.1.15000.2                34.743422  
1.1.13000.0                34.062500  
1.1.13601.0                34.011628  
1.1.13303.0                33.142037  
1.1.14700.5                32.624315  
1.1.14800.3                31.051044  
1.1.14303.0                30.543933  
1.1.14901.4                30.433553  
1.1.14901.3                30.191257  
1.1.14201.0                30.049261  
1.1.15000.1                29.797535  
1.1.14800.1                29.735936  
1.1.14500.2                28.906250  


 AppVersion Vs HasDetections


                  No Malware detected  Malware Detected  \
AppVersion                                                
4.8.10240.17914                   507              1124   
4.10.14393.2273                   174               352   
4.8.10240.17918                   156               283   
4.10.14393.726                     64               107   
4.8.10240.17889                   559               901   
...                               ...               ...   
4.16.17656.18052               150660             73309   
4.14.17613.18038                  656               317   
4.14.17613.18039                34207             16050   
4.14.17639.18041               125673             58938   
4.17.17672.1000                    77                26   

                  Total of each category  % of Malware detected  
AppVersion                                                       
4.8.10240.17914                     1631              68.914776  
4.10.14393.2273                      526              66.920152  
4.8.10240.17918                      439              64.464692  
4.10.14393.726                       171              62.573099  
4.8.10240.17889                     1460              61.712329  
...                                  ...                    ...  
4.16.17656.18052                  223969              32.731762  
4.14.17613.18038                     973              32.579651  
4.14.17613.18039                   50257              31.935850  
4.14.17639.18041                  184611              31.925508  
4.17.17672.1000                      103              25.242718  

[69 rows x 4 columns]


 Platform Vs HasDetections


           No Malware detected  Malware Detected  Total of each category  \
Platform                                                                   
windows8                 50375             50757                  101132   
windows10              3907139           3927787                 7834926   
windows7                 15467             12333                   27800   

           % of Malware detected  
Platform                          
windows8               50.188862  
windows10              50.131769  
windows7               44.363309  


 Processor Vs HasDetections


           No Malware detected  Malware Detected  Total of each category  \
Processor                                                                  
x64                    3539121           3726969                 7266090   
x86                     433547            263904                  697451   
arm64                      313                 4                     317   

           % of Malware detected  
Processor                         
x64                    51.292635  
x86                    37.838357  
arm64                   1.261830  


 Census_OSBranch Vs HasDetections


                           No Malware detected  Malware Detected  \
Census_OSBranch                                                    
rs4_release                            1734405           1931166   
rs3_release_svc_escrow                  550037            593829   
th1_st1                                  89404             93055   
rs_prerelease_flt                         1317              1211   
rs2_release                             383590            347373   
th2_release                              60255             52960   
rs3_release                             607838            524854   
rs1_release                             358607            298787   
th2_release_sec                         135494            107603   
rs5_release                               7611              6043   
th1                                      39560             30687   
rs3_release_svc_escrow_im                 3088              2246   
rs_prerelease                             1727              1060   

                           Total of each category  % of Malware detected  
Census_OSBranch                                                           
rs4_release                               3665571              52.683907  
rs3_release_svc_escrow                    1143866              51.914210  
th1_st1                                    182459              51.000499  
rs_prerelease_flt                            2528              47.903481  
rs2_release                                730963              47.522652  
th2_release                                113215              46.778254  
rs3_release                               1132692              46.336868  
rs1_release                                657394              45.450217  
th2_release_sec                            243097              44.263401  
rs5_release                                 13654              44.258093  
th1                                         70247              43.684428  
rs3_release_svc_escrow_im                    5334              42.107237  
rs_prerelease                                2787              38.033728  


 OsPlatformSubRelease Vs HasDetections


                      No Malware detected  Malware Detected  \
OsPlatformSubRelease                                          
rs4                               1700018           1884120   
windows8.1                          50375             50757   
rs3                               1180905           1149765   
th1                                128004            123102   
rs2                                376748            341319   
rs1                                339664            282453   
th2                                171161            138828   
windows7                            15467             12333   
prers5                              10639              8200   

                      Total of each category  % of Malware detected  
OsPlatformSubRelease                                                 
rs4                                  3584138              52.568288  
windows8.1                            101132              50.188862  
rs3                                  2330670              49.331952  
th1                                   251106              49.023918  
rs2                                   718067              47.533030  
rs1                                   622117              45.401910  
th2                                   309989              44.784815  
windows7                               27800              44.363309  
prers5                                 18839              43.526726  


 SkuEdition Vs HasDetections


                 No Malware detected  Malware Detected  \
SkuEdition                                               
Education                      17271             20417   
Enterprise LTSB                 8151              9382   
Enterprise                     14907             16470   
Pro                          1361689           1433203   
Home                         2555119           2499343   
Invalid                        12637             10164   
Cloud                           3207              1898   

                 Total of each category  % of Malware detected  
SkuEdition                                                      
Education                         37688              54.173742  
Enterprise LTSB                   17533              53.510523  
Enterprise                        31377              52.490678  
Pro                             2794892              51.279370  
Home                            5054462              49.448250  
Invalid                           22801              44.576992  
Cloud                              5105              37.179236  


 Census_MDC2FormFactor Vs HasDetections


                       No Malware detected  Malware Detected  \
Census_MDC2FormFactor                                          
Desktop                             785872            869810   
AllInOne                            120960            127194   
Notebook                           2584090           2627846   
Convertible                         192775            191814   
PCOther                              49354             43908   
Detachable                          176219            106870   
SmallServer                            602               354   
MediumServer                           300               135   
LargeTablet                          40009             17727   
SmallTablet                          22794              5217   

                       Total of each category  % of Malware detected  
Census_MDC2FormFactor                                                 
Desktop                               1655682              52.534847  
AllInOne                               248154              51.256075  
Notebook                              5211936              50.419767  
Convertible                            384589              49.875061  
PCOther                                 93262              47.080268  
Detachable                             283089              37.751379  
SmallServer                               956              37.029289  
MediumServer                              435              31.034483  
LargeTablet                             57736              30.703547  
SmallTablet                             28011              18.624826  


 Census_DeviceFamily Vs HasDetections


                     No Malware detected  Malware Detected  \
Census_DeviceFamily                                          
Windows.Desktop                  3972974           3990872   

                     Total of each category  % of Malware detected  
Census_DeviceFamily                                                 
Windows.Desktop                     7963846               50.11237  


 ProductName Vs HasDetections


              No Malware detected  Malware Detected  Total of each category  \
ProductName                                                                   
win8defender              3956889           3978286                 7935175   
mse                         16080             12580                   28660   

              % of Malware detected  
ProductName                          
win8defender              50.134824  
mse                       43.893929  


 Census_OSVersion Vs HasDetections


                  No Malware detected  Malware Detected  \
Census_OSVersion                                          
10.0.14393.2396                    76               186   
10.0.10240.17914                  518              1229   
10.0.10240.17918                  153               313   
10.0.10240.17946                 1181              1982   
10.0.10240.17889                  587               949   
...                               ...               ...   
10.0.14393.1378                    71                34   
10.0.17672.1000                   225                98   
10.0.14393.2248                  1016               398   
10.0.14393.2312                   353               120   
10.0.14393.2273                   113                29   

                  Total of each category  % of Malware detected  
Census_OSVersion                                                 
10.0.14393.2396                      262              70.992366  
10.0.10240.17914                    1747              70.349170  
10.0.10240.17918                     466              67.167382  
10.0.10240.17946                    3163              62.662030  
10.0.10240.17889                    1536              61.783854  
...                                  ...                    ...  
10.0.14393.1378                      105              32.380952  
10.0.17672.1000                      323              30.340557  
10.0.14393.2248                     1414              28.147100  
10.0.14393.2312                      473              25.369979  
10.0.14393.2273                      142              20.422535  

[227 rows x 4 columns]


 Census_OSArchitecture Vs HasDetections


                       No Malware detected  Malware Detected  \
Census_OSArchitecture                                          
amd64                              3539626           3726963   
x86                                 433042            263910   
arm64                                  313                 4   

                       Total of each category  % of Malware detected  
Census_OSArchitecture                                                 
amd64                                 7266589              51.289030  
x86                                    696952              37.866309  
arm64                                     317               1.261830  

Category Columns Summary

Summary of Catergory Columns

  • In feature Smart Screen if the status is existsnotset, the chance of a system being infected by a malware is as high as 81%. Followed by warn status with 58%.
  • System in power platform with role names Slate, EnterpriseServer and AppliancePC have less chance of vulnerability.
  • OSBuildRevision versions 2396,17914,17918,17946 and 17889 are most vulnerable for a malware affect and versions 2273,2312,2248,1378 are safest categories from malware.
  • Engine Version 1.1.14901 has considerable amount of people installed it which a malware detection percent of 30.
  • App versions 4.14.17639.18041 and 4.16.17656.18052 are safest categories.
  • Devices (Census_MDC2FormFactor) like Tablets,Servers and Detachables are less prone to malware when compared to Desktops,Notebooks and Convertibles.

Numeric columns

In [370]:
num_cols
Out[370]:
['OrganizationIdentifier',
 'Wdft_RegionIdentifier',
 'Census_InternalBatteryNumberOfCharges',
 'Census_FirmwareManufacturerIdentifier',
 'Census_FirmwareVersionIdentifier',
 'Census_OEMModelIdentifier',
 'Census_OEMNameIdentifier',
 'Census_TotalPhysicalRAM',
 'Census_OSInstallLanguageIdentifier',
 'IeVerIdentifier',
 'Census_PrimaryDiskTotalCapacity',
 'Census_SystemVolumeTotalCapacity',
 'Census_InternalPrimaryDiagonalDisplaySizeInInches',
 'Census_InternalPrimaryDisplayResolutionHorizontal',
 'Census_InternalPrimaryDisplayResolutionVertical',
 'Census_ProcessorModelIdentifier',
 'Census_ProcessorManufacturerIdentifier',
 'Census_ProcessorCoreCount',
 'AVProductStatesIdentifier',
 'AVProductsInstalled',
 'AVProductsEnabled',
 'RtpStateBitfield',
 'UacLuaenable',
 'GeoNameIdentifier',
 'Census_OSUILocaleIdentifier',
 'Census_OSBuildNumber',
 'CountryIdentifier',
 'LocaleEnglishNameIdentifier',
 'OsBuild',
 'OsSuite']
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [371]:
data_temp = df.sample(20000)
for col in num_cols:
    fig,ax = plt.subplots(1,2,figsize=(10,7))
    sns.boxplot(y=col,x='HasDetections',data=data_temp,ax=ax[0])
    sns.kdeplot(x=data_temp[data_temp.HasDetections==1][col],ax=ax[1])
    sns.kdeplot(x=data_temp[data_temp.HasDetections==0][col],ax=ax[1])
    ax[1].legend(("Malware Detected","No Malware Detected"),loc ='best')
    plt.show()
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:305: UserWarning: Dataset has 0 variance; skipping density estimate.
  warnings.warn(msg, UserWarning)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:305: UserWarning: Dataset has 0 variance; skipping density estimate.
  warnings.warn(msg, UserWarning)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:305: UserWarning: Dataset has 0 variance; skipping density estimate.
  warnings.warn(msg, UserWarning)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:305: UserWarning: Dataset has 0 variance; skipping density estimate.
  warnings.warn(msg, UserWarning)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:305: UserWarning: Dataset has 0 variance; skipping density estimate.
  warnings.warn(msg, UserWarning)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:305: UserWarning: Dataset has 0 variance; skipping density estimate.
  warnings.warn(msg, UserWarning)
In [ ]:
 
In [ ]:
 
In [374]:
for i,col in enumerate(num_cols):
    sns.scatterplot(x=col,y=num_cols[i-1],hue="HasDetections",data=df.sample(10000))
    #plt.axis((np.mean(col)-3.0*np.mean(col)),(np.mean(col)+3.0*np.mean(col)),(np.mean(num_cols)-3.0*np.mean(num_cols)),(np.mean(num_cols)+3.0*np.mean(num_cols)))
    plt.show()
print("Bi variant numeric columns")
Bi variant numeric columns

There are few columns despite seggregation of column types are present in numeric eventhough they are categorical columns. let's move them to category columns and run category functions to get insights.

In [1]:
new_cat_columns = ['OsSuite','OrganizationIdentifier','Wdft_RegionIdentifier',
                  'Census_FirmwareManufacturerIdentifier','Census_ProcessorManufacturerIdentifier','Census_ProcessorCoreCount',
                  'AVProductsInstalled','AVProductsEnabled','RtpStateBitfield','UacLuaenable','Census_OSBuildNumber','OsBuild','OsSuite']
new_cat_columns
Out[1]:
['OsSuite',
 'OrganizationIdentifier',
 'Wdft_RegionIdentifier',
 'Census_FirmwareManufacturerIdentifier',
 'Census_ProcessorManufacturerIdentifier',
 'Census_ProcessorCoreCount',
 'AVProductsInstalled',
 'AVProductsEnabled',
 'RtpStateBitfield',
 'UacLuaenable',
 'Census_OSBuildNumber',
 'OsBuild',
 'OsSuite']
In [380]:
for col in new_cat_columns:
    print("\n\n "+col+" Vs HasDetections\n\n")
    eda_df = cat_cols_eda(df,col)
    eda_df["Total of each category"] = eda_df.sum(axis=1)
    eda_df["% of Malware detected"] = (eda_df["Malware Detected"]*100)/(eda_df["Malware Detected"]+eda_df["No Malware detected"])
    eda_df=eda_df[eda_df["Total of each category"] > 100 ]
    eda_df.sort_values(by="% of Malware detected",ascending=False,inplace=True)
    fig,ax=plt.subplots(1,2,figsize=(20,8))
    temp1=eda_df[eda_df["% of Malware detected"] > 52.0]
    if temp1.shape[0] > 0:
        temp1[["No Malware detected","Malware Detected"]].plot(kind='bar',ax=ax[0])
        ax[0].set_title("Most Vulnerable categories in '"+col+"' - high percentages in malware detected")
    
    temp2=eda_df[eda_df["% of Malware detected"] < 48.0]
    if temp2.shape[0] > 0 :
        temp2[["No Malware detected","Malware Detected"]].plot(kind='bar',ax=ax[1])
        ax[1].set_title("Safest categories in '"+col+"' - lowest percentages in malware detected")
    plt.show()
    print(eda_df)

 OsSuite Vs HasDetections


         No Malware detected  Malware Detected  Total of each category  \
OsSuite                                                                  
256                  1408903           1485487                 2894390   
768                  2563978           2505312                 5069290   
784                      100                77                     177   

         % of Malware detected  
OsSuite                         
256                  51.322973  
768                  49.421359  
784                  43.502825  


 OrganizationIdentifier Vs HasDetections


                        No Malware detected  Malware Detected  \
OrganizationIdentifier                                          
50.0                                  15611             23846   
20.0                                    306               451   
8.0                                     260               358   
36.0                                   1611              1995   
44.0                                     61                70   
48.0                                  26213             29506   
32.0                                   1632              1793   
22.0                                    166               182   
42.0                                     54                59   
46.0                                   4614              5008   
11.0                                   8193              8870   
18.0                                 772458            793279   
28.0                                    685               691   
27.0                                3113195           3101807   
29.0                                     55                54   
52.0                                   1352              1326   
47.0                                    179               174   
1.0                                     412               371   
37.0                                   9032              7941   
49.0                                   6567              5632   
39.0                                    198               166   
51.0                                    435               361   
5.0                                     966               799   
4.0                                     684               551   
31.0                                    191               152   
40.0                                    820               647   
10.0                                    545               416   
16.0                                    129                95   
3.0                                     176               124   
14.0                                   2530              1756   
33.0                                   1536              1063   
2.0                                    1264               862   
19.0                                     99                55   
6.0                                     233               113   
21.0                                    193                85   
26.0                                    100                44   

                        Total of each category  % of Malware detected  
OrganizationIdentifier                                                 
50.0                                     39457              60.435411  
20.0                                       757              59.577279  
8.0                                        618              57.928803  
36.0                                      3606              55.324459  
44.0                                       131              53.435115  
48.0                                     55719              52.955006  
32.0                                      3425              52.350365  
22.0                                       348              52.298851  
42.0                                       113              52.212389  
46.0                                      9622              52.047391  
11.0                                     17063              51.983825  
18.0                                   1565737              50.664895  
28.0                                      1376              50.218023  
27.0                                   6215002              49.908383  
29.0                                       109              49.541284  
52.0                                      2678              49.514563  
47.0                                       353              49.291785  
1.0                                        783              47.381865  
37.0                                     16973              46.786072  
49.0                                     12199              46.167719  
39.0                                       364              45.604396  
51.0                                       796              45.351759  
5.0                                       1765              45.269122  
4.0                                       1235              44.615385  
31.0                                       343              44.314869  
40.0                                      1467              44.103613  
10.0                                       961              43.288241  
16.0                                       224              42.410714  
3.0                                        300              41.333333  
14.0                                      4286              40.970602  
33.0                                      2599              40.900346  
2.0                                       2126              40.545626  
19.0                                       154              35.714286  
6.0                                        346              32.658960  
21.0                                       278              30.575540  
26.0                                       144              30.555556  


 Wdft_RegionIdentifier Vs HasDetections


                       No Malware detected  Malware Detected  \
Wdft_RegionIdentifier                                          
14.0                                  1485              1791   
1.0                                 532368            619146   
10.0                                816797            855529   
11.0                                620460            631042   
7.0                                 267226            271084   
3.0                                 607024            594055   
8.0                                 128645            125219   
5.0                                  96489             89986   
6.0                                  74324             68370   
13.0                                104485             95913   
4.0                                  64418             58726   
12.0                                 79446             71266   
15.0                                502171            442847   
2.0                                  39202             33326   
9.0                                  38441             32577   

                       Total of each category  % of Malware detected  
Wdft_RegionIdentifier                                                 
14.0                                     3276              54.670330  
1.0                                   1151514              53.767996  
10.0                                  1672326              51.158028  
11.0                                  1251502              50.422772  
7.0                                    538310              50.358344  
3.0                                   1201079              49.460110  
8.0                                    253864              49.325229  
5.0                                    186475              48.256335  
6.0                                    142694              47.913717  
13.0                                   200398              47.861256  
4.0                                    123144              47.688885  
12.0                                   150712              47.286215  
15.0                                   945018              46.861224  
2.0                                     72528              45.949151  
9.0                                     71018              45.871469  


 Census_InternalBatteryNumberOfCharges Vs HasDetections


                                       No Malware detected  Malware Detected  \
Census_InternalBatteryNumberOfCharges                                          
573.0                                                   46               102   
621.0                                                   42                78   
645.0                                                   46                83   
591.0                                                   51                88   
568.0                                                   61               102   
...                                                    ...               ...   
768.0                                                   95                56   
12425.0                                                120                68   
256.0                                                 6203              3068   
65535.0                                                 95                46   
20713.0                                                 91                42   

                                       Total of each category  \
Census_InternalBatteryNumberOfCharges                           
573.0                                                     148   
621.0                                                     120   
645.0                                                     129   
591.0                                                     139   
568.0                                                     163   
...                                                       ...   
768.0                                                     151   
12425.0                                                   188   
256.0                                                    9271   
65535.0                                                   141   
20713.0                                                   133   

                                       % of Malware detected  
Census_InternalBatteryNumberOfCharges                         
573.0                                              68.918919  
621.0                                              65.000000  
645.0                                              64.341085  
591.0                                              63.309353  
568.0                                              62.576687  
...                                                      ...  
768.0                                              37.086093  
12425.0                                            36.170213  
256.0                                              33.092439  
65535.0                                            32.624113  
20713.0                                            31.578947  

[687 rows x 4 columns]


 Census_FirmwareManufacturerIdentifier Vs HasDetections


                                       No Malware detected  Malware Detected  \
Census_FirmwareManufacturerIdentifier                                          
658.0                                                   63               128   
687.0                                                   52               101   
301.0                                                  133               229   
1014.0                                                 218               360   
892.0                                                  192               264   
...                                                    ...               ...   
357.0                                                  441                81   
869.0                                                 1354               215   
1040.0                                                 313                49   
182.0                                                  124                12   
1038.0                                                 108                10   

                                       Total of each category  \
Census_FirmwareManufacturerIdentifier                           
658.0                                                     191   
687.0                                                     153   
301.0                                                     362   
1014.0                                                    578   
892.0                                                     456   
...                                                       ...   
357.0                                                     522   
869.0                                                    1569   
1040.0                                                    362   
182.0                                                     136   
1038.0                                                    118   

                                       % of Malware detected  
Census_FirmwareManufacturerIdentifier                         
658.0                                              67.015707  
687.0                                              66.013072  
301.0                                              63.259669  
1014.0                                             62.283737  
892.0                                              57.894737  
...                                                      ...  
357.0                                              15.517241  
869.0                                              13.702996  
1040.0                                             13.535912  
182.0                                               8.823529  
1038.0                                              8.474576  

[114 rows x 4 columns]


 Census_ProcessorManufacturerIdentifier Vs HasDetections


                                        No Malware detected  Malware Detected  \
Census_ProcessorManufacturerIdentifier                                          
5.0                                                 3493194           3532152   
1.0                                                  479345            458680   
3.0                                                     128                41   
10.0                                                    312                 4   

                                        Total of each category  \
Census_ProcessorManufacturerIdentifier                           
5.0                                                    7025346   
1.0                                                     938025   
3.0                                                        169   
10.0                                                       316   

                                        % of Malware detected  
Census_ProcessorManufacturerIdentifier                         
5.0                                                 50.277267  
1.0                                                 48.898484  
3.0                                                 24.260355  
10.0                                                 1.265823  


 Census_ProcessorCoreCount Vs HasDetections


                           No Malware detected  Malware Detected  \
Census_ProcessorCoreCount                                          
12.0                                     31885             45626   
16.0                                      6065              8563   
6.0                                      25541             33885   
8.0                                     336731            427307   
32.0                                       632               757   
20.0                                       580               674   
24.0                                       595               658   
4.0                                    2419226           2506135   
40.0                                       135               138   
5.0                                         83                79   
28.0                                       111               105   
48.0                                        65                58   
36.0                                       129               113   
3.0                                       5808              5067   
2.0                                    1115617            947003   
1.0                                      29573             14551   

                           Total of each category  % of Malware detected  
Census_ProcessorCoreCount                                                 
12.0                                        77511              58.863903  
16.0                                        14628              58.538419  
6.0                                         59426              57.020496  
8.0                                        764038              55.927454  
32.0                                         1389              54.499640  
20.0                                         1254              53.748006  
24.0                                         1253              52.513966  
4.0                                       4925361              50.882260  
40.0                                          273              50.549451  
5.0                                           162              48.765432  
28.0                                          216              48.611111  
48.0                                          123              47.154472  
36.0                                          242              46.694215  
3.0                                         10875              46.593103  
2.0                                       2062620              45.912626  
1.0                                         44124              32.977518  


 AVProductsInstalled Vs HasDetections


                     No Malware detected  Malware Detected  \
AVProductsInstalled                                          
1.0                              2434326           3012749   
2.0                              1391690            918377   
3.0                               140679             57454   
4.0                                 5956              2181   
5.0                                  310               112   

                     Total of each category  % of Malware detected  
AVProductsInstalled                                                 
1.0                                 5447075              55.309483  
2.0                                 2310067              39.755427  
3.0                                  198133              28.997693  
4.0                                    8137              26.803490  
5.0                                     422              26.540284  


 AVProductsEnabled Vs HasDetections


                   No Malware detected  Malware Detected  \
AVProductsEnabled                                          
1.0                            3844063           3930074   
4.0                                256               130   
0.0                              13318              6536   
2.0                             111867             52628   
3.0                               3461              1508   

                   Total of each category  % of Malware detected  
AVProductsEnabled                                                 
1.0                               7774137              50.553187  
4.0                                   386              33.678756  
0.0                                 19854              32.920318  
2.0                                164495              31.993678  
3.0                                  4969              30.348159  


 RtpStateBitfield Vs HasDetections


                  No Malware detected  Malware Detected  \
RtpStateBitfield                                          
8.0                              3948             13052   
7.0                           3839962           3908389   
1.0                               849               569   
0.0                            109911             65183   
3.0                              1953               789   
5.0                             16356              2876   

                  Total of each category  % of Malware detected  
RtpStateBitfield                                                 
8.0                                17000              76.776471  
7.0                              7748351              50.441558  
1.0                                 1418              40.126939  
0.0                               175094              37.227432  
3.0                                 2742              28.774617  
5.0                                19232              14.954243  


 UacLuaenable Vs HasDetections


              No Malware detected  Malware Detected  Total of each category  \
UacLuaenable                                                                  
0.0                         18881             19573                   38454   
48.0                           89                92                     181   
1.0                       3953971           3971187                 7925158   

              % of Malware detected  
UacLuaenable                         
0.0                       50.899776  
48.0                      50.828729  
1.0                       50.108616  


 Census_OSBuildNumber Vs HasDetections


                      No Malware detected  Malware Detected  \
Census_OSBuildNumber                                          
17133                                  53                70   
17134                             1734343           1931085   
17733                                 221               215   
16299                             1160957           1120922   
10240                              128964            123742   
18242                                  59                56   
17755                                 307               285   
17746                                 573               524   
15063                              383586            347365   
17763                                 491               431   
17741                                 396               344   
17754                                 497               430   
17735                                 463               396   
17692                                1558              1327   
17758                                 815               688   
17744                                1164               975   
17760                                 286               239   
14393                              358577            298768   
10586                              195749            160563   
17751                                 504               391   
18237                                  84                58   
17686                                 298               201   
17677                                 172               111   
17682                                 140                84   
17738                                1781              1057   
17666                                 106                62   
18234                                 118                69   
17661                                  72                38   
17672                                 225                98   

                      Total of each category  % of Malware detected  
Census_OSBuildNumber                                                 
17133                                    123              56.910569  
17134                                3665428              52.683752  
17733                                    436              49.311927  
16299                                2281879              49.122762  
10240                                 252706              48.966784  
18242                                    115              48.695652  
17755                                    592              48.141892  
17746                                   1097              47.766636  
15063                                 730951              47.522337  
17763                                    922              46.746204  
17741                                    740              46.486486  
17754                                    927              46.386192  
17735                                    859              46.100116  
17692                                   2885              45.996534  
17758                                   1503              45.775116  
17744                                   2139              45.582048  
17760                                    525              45.523810  
14393                                 657345              45.450715  
10586                                 356312              45.062473  
17751                                    895              43.687151  
18237                                    142              40.845070  
17686                                    499              40.280561  
17677                                    283              39.222615  
17682                                    224              37.500000  
17738                                   2838              37.244538  
17666                                    168              36.904762  
18234                                    187              36.898396  
17661                                    110              34.545455  
17672                                    323              30.340557  


 OsBuild Vs HasDetections


         No Malware detected  Malware Detected  Total of each category  \
OsBuild                                                                  
7600                     113               128                     241   
17134                1700017           1884117                 3584134   
9600                   50375             50757                  101132   
17763                    329               326                     655   
17733                    225               221                     446   
16299                1180905           1149765                 2330670   
10240                 128004            123102                  251106   
17758                    670               641                    1311   
17760                    259               245                     504   
15063                 376748            341319                  718067   
17746                    553               499                    1052   
17744                   1083               977                    2060   
17755                    341               303                     644   
17741                    392               347                     739   
17735                    466               410                     876   
14393                 339664            282453                  622117   
17754                    482               395                     877   
17692                   1633              1328                    2961   
10586                 171161            138828                  309989   
17738                   1273              1031                    2304   
17751                    501               401                     902   
18237                     75                60                     135   
7601                   15354             12205                   27559   
17686                    304               203                     507   
17677                    178               114                     292   
18234                    119                76                     195   
17666                    109                66                     175   
17682                    154                87                     241   
17661                     75                39                     114   
17672                    234               100                     334   
17713                    784               122                     906   

         % of Malware detected  
OsBuild                         
7600                 53.112033  
17134                52.568263  
9600                 50.188862  
17763                49.770992  
17733                49.551570  
16299                49.331952  
10240                49.023918  
17758                48.893974  
17760                48.611111  
15063                47.533030  
17746                47.433460  
17744                47.427184  
17755                47.049689  
17741                46.955345  
17735                46.803653  
14393                45.401910  
17754                45.039909  
17692                44.849713  
10586                44.784815  
17738                44.748264  
17751                44.456763  
18237                44.444444  
7601                 44.286803  
17686                40.039448  
17677                39.041096  
18234                38.974359  
17666                37.714286  
17682                36.099585  
17661                34.210526  
17672                29.940120  
17713                13.465784  


 OsSuite Vs HasDetections


         No Malware detected  Malware Detected  Total of each category  \
OsSuite                                                                  
256                  1408903           1485487                 2894390   
768                  2563978           2505312                 5069290   
784                      100                77                     177   

         % of Malware detected  
OsSuite                         
256                  51.322973  
768                  49.421359  
784                  43.502825  

Additional Category Columns Summary

  • OrganizationIdentifiers 2.0,19.0,6.0,21.0,26.0 are less vulnerable and 50.0 is most vulnerable.
  • It looks mostly that with numbe of battery charges increases the system becomes less vulnerable to virus (Census_InternalBatteryNumberOfCharges).
  • Census_ProcessorManufactureridentifier has very less 3.0 and 10.0 versions but their attack percentage are less than 25% mainly 10.0 with just 1% malware detected.
  • _FirmwareManufacturers 1038,182,1040,869,357 are most safest from Malware and manufacturers 658,687,301 and 1014 are most vulnerable of malware (Census_FirmwareManufacturererIdentifier).
  • Processor Core count 1 is most safest, followed by 2. Cores 12 and 16 are most vulnerable to malware.
  • Anti virus products 3.0,4.0 and 5.0 having an average percentage of 27% of malware which being leastand safe form malware. Versions 1.0 with 55% most vulnerable. It explains because it being first version and many hacks might have happened which inturn made Anti virus teams release next versions.)(AVProductsInstalled)
  • If 2 or 3 anti virus products are enabled then system is 70% safe from malware. (AVProductsEnabled).
  • RtpStateBitfield which represents Real time transport protocol which helps in streaming audio and video over internet and bitfield 8.0 is 76% vulnerable to virus and 5.0 is safest followed by 3.0.
  • Census_OSBuildNumber and OSBuild contains same information, almost. OSBuild 17713,17672,17661,17682 are most safest builds from malware. Alternatively, 7600 is most vulnerable.
In [384]:
#Delete new cat columns from num cols for further analysis
rem_num_cols = list(set(num_cols)-set(new_cat_columns))
rem_num_cols
Out[384]:
['Census_InternalPrimaryDiagonalDisplaySizeInInches',
 'AVProductStatesIdentifier',
 'Census_OEMNameIdentifier',
 'Census_InternalPrimaryDisplayResolutionHorizontal',
 'IeVerIdentifier',
 'Census_ProcessorModelIdentifier',
 'LocaleEnglishNameIdentifier',
 'CountryIdentifier',
 'GeoNameIdentifier',
 'Census_PrimaryDiskTotalCapacity',
 'Census_InternalPrimaryDisplayResolutionVertical',
 'Census_OEMModelIdentifier',
 'Census_SystemVolumeTotalCapacity',
 'Census_FirmwareVersionIdentifier',
 'Census_OSUILocaleIdentifier',
 'Census_OSInstallLanguageIdentifier',
 'Census_TotalPhysicalRAM']
In [422]:
finite_data = df[np.isfinite(df[col])==True]
temp_data=finite_data[rem_num_cols +['HasDetections']].sample(1000)
#temp_data = sns.load_dataset(temp_data)
snspair=sns.pairplot(data=temp_data,hue='HasDetections',height=15)
snspair.savefig('../Images/pairplot.png')
snspair
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:305: UserWarning: Dataset has 0 variance; skipping density estimate.
  warnings.warn(msg, UserWarning)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:305: UserWarning: Dataset has 0 variance; skipping density estimate.
  warnings.warn(msg, UserWarning)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:305: UserWarning: Dataset has 0 variance; skipping density estimate.
  warnings.warn(msg, UserWarning)
Out[422]:
<seaborn.axisgrid.PairGrid at 0x7fb622f13dc0>
In [414]:
print("Bi variant numeric columns")
for i,col in enumerate(rem_num_cols):
    sns.scatterplot(x=col,y=rem_num_cols[i-1],hue="HasDetections",data=df.sample(10000))
    plt.show()
print("Bi variant numeric columns")
Bi variant numeric columns
Bi variant numeric columns

Insights

  • Census_OSUILocaleIdentifier and Census_OSInstallLanguageIdentifier are strongly positively related.
In [387]:
for col in rem_num_cols:
    finite_data = df[np.isfinite(df[col])==True]
    fig,ax = plt.subplots(1,2,figsize=(15,10))
    data_f= finite_data.sample(10000)
    sns.violinplot(y=col,x='HasDetections',data=data_f,ax=ax[0])
    ax[1]=sns.histplot(data_f,x=col,hue='HasDetections',bins=10)
    ax[0].set_title(col)
    plt.show()
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:153: RuntimeWarning: invalid value encountered in multiply
  y *= step
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:163: RuntimeWarning: invalid value encountered in add
  y += start
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:153: RuntimeWarning: invalid value encountered in multiply
  y *= step
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:163: RuntimeWarning: invalid value encountered in add
  y += start
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:153: RuntimeWarning: invalid value encountered in multiply
  y *= step
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:163: RuntimeWarning: invalid value encountered in add
  y += start
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:153: RuntimeWarning: invalid value encountered in multiply
  y *= step
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:163: RuntimeWarning: invalid value encountered in add
  y += start
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:153: RuntimeWarning: invalid value encountered in multiply
  y *= step
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:163: RuntimeWarning: invalid value encountered in add
  y += start
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:153: RuntimeWarning: invalid value encountered in multiply
  y *= step
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:163: RuntimeWarning: invalid value encountered in add
  y += start
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:153: RuntimeWarning: invalid value encountered in multiply
  y *= step
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:163: RuntimeWarning: invalid value encountered in add
  y += start
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:153: RuntimeWarning: invalid value encountered in multiply
  y *= step
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/function_base.py:163: RuntimeWarning: invalid value encountered in add
  y += start
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
In [ ]:
 
In [327]:
#Spider charts
spider_data_detected=df[df['HasDetections']==1][num_cols].describe().transpose()
spider_data_detected=spider_data_detected.drop(columns="count")
spider_data_detected=spider_data_detected[np.isfinite(spider_data_detected)]
spider_data_detected= spider_data_detected[spider_data_detected[spider_data_detected.columns] > 0].dropna()
spider_data_detected=spider_data_detected.transpose().drop(columns="Census_OEMModelIdentifier")
spider_data_detected
Out[327]:
Census_FirmwareVersionIdentifier Census_TotalPhysicalRAM AVProductStatesIdentifier Census_OSUILocaleIdentifier Census_OSBuildNumber CountryIdentifier OsBuild OsSuite
mean 32927.507812 6382.580566 49319.066406 59.988654 16038.022979 108.575253 15957.862066 577.42319
std 21169.441406 4789.770020 12116.578125 44.856033 1769.188613 63.071224 1917.595430 247.49946
min 73.000000 511.000000 3.000000 2.000000 9200.000000 1.000000 7600.000000 256.00000
25% 13115.000000 4096.000000 53447.000000 31.000000 16299.000000 51.000000 16299.000000 256.00000
50% 33070.000000 4096.000000 53447.000000 34.000000 16299.000000 97.000000 16299.000000 768.00000
75% 52369.000000 8192.000000 53447.000000 83.000000 17134.000000 162.000000 17134.000000 768.00000
max 72105.000000 524288.000000 70498.000000 162.000000 18242.000000 222.000000 18242.000000 784.00000
In [329]:
spider_data_not_detected=df[df['HasDetections']==0][num_cols].describe().transpose()
spider_data_not_detected=spider_data_not_detected.drop(columns="count")
spider_data_not_detected=spider_data_not_detected[np.isfinite(spider_data_not_detected)]
spider_data_not_detected=spider_data_not_detected[spider_data_not_detected[spider_data_not_detected.columns]> 0].dropna()
spider_data_not_detected=spider_data_not_detected.transpose().drop(columns="Census_OEMModelIdentifier")
spider_data_not_detected
Out[329]:
Census_FirmwareVersionIdentifier Census_TotalPhysicalRAM AVProductStatesIdentifier Census_OSUILocaleIdentifier Census_OSBuildNumber CountryIdentifier OsBuild OsSuite
mean 32969.343750 5.771309e+03 45980.531250 59.885807 15897.841770 107.721439 15819.858876 586.434386
std 21210.908203 4.654353e+03 16165.286133 44.488819 1850.482041 62.787966 1995.554150 244.940299
min 10.000000 4.000000e+02 6.000000 1.000000 7600.000000 1.000000 7600.000000 256.000000
25% 13153.000000 4.096000e+03 46669.000000 31.000000 15063.000000 51.000000 15063.000000 256.000000
50% 33066.000000 4.096000e+03 53447.000000 34.000000 16299.000000 97.000000 16299.000000 768.000000
75% 52458.000000 8.192000e+03 53447.000000 83.000000 17134.000000 160.000000 17134.000000 768.000000
max 72102.000000 1.572864e+06 70496.000000 162.000000 18244.000000 222.000000 18244.000000 784.000000
In [351]:
array_not_detected=[]
array_detected=[]
for index in spider_data_detected.index:
    array_detected.append(np.array(spider_data_detected.loc[index].values).tolist()[0:6])
for index in spider_data_not_detected.index:
    array_not_detected.append(np.array(spider_data_not_detected.loc[index].values).tolist()[0:6])
#array_detected,array_not_detected
In [352]:
from matplotlib.patches import Circle, RegularPolygon
from matplotlib.path import Path
from matplotlib.projections.polar import PolarAxes
from matplotlib.projections import register_projection
from matplotlib.spines import Spine
from matplotlib.transforms import Affine2D


def radar_factory(num_vars, frame='circle'):
    """Create a radar chart with `num_vars` axes.

    This function creates a RadarAxes projection and registers it.

    Parameters
    ----------
    num_vars : int
        Number of variables for radar chart.
    frame : {'circle' | 'polygon'}
        Shape of frame surrounding axes.

    """
    # calculate evenly-spaced axis angles
    theta = np.linspace(0, 2*np.pi, num_vars, endpoint=False)

    class RadarAxes(PolarAxes):

        name = 'radar'
        # use 1 line segment to connect specified points
        RESOLUTION = 1

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # rotate plot such that the first axis is at the top
            self.set_theta_zero_location('N')

        def fill(self, *args, closed=True, **kwargs):
            """Override fill so that line is closed by default"""
            return super().fill(closed=closed, *args, **kwargs)

        def plot(self, *args, **kwargs):
            """Override plot so that line is closed by default"""
            lines = super().plot(*args, **kwargs)
            for line in lines:
                self._close_line(line)

        def _close_line(self, line):
            x, y = line.get_data()
            # FIXME: markers at x[0], y[0] get doubled-up
            if x[0] != x[-1]:
                x = np.concatenate((x, [x[0]]))
                y = np.concatenate((y, [y[0]]))
                line.set_data(x, y)

        def set_varlabels(self, labels):
            self.set_thetagrids(np.degrees(theta), labels)

        def _gen_axes_patch(self):
            # The Axes patch must be centered at (0.5, 0.5) and of radius 0.5
            # in axes coordinates.
            if frame == 'circle':
                return Circle((0.5, 0.5), 0.5)
            elif frame == 'polygon':
                return RegularPolygon((0.5, 0.5), num_vars,
                                      radius=.5, edgecolor="k")
            else:
                raise ValueError("unknown value for 'frame': %s" % frame)

        def _gen_axes_spines(self):
            if frame == 'circle':
                return super()._gen_axes_spines()
            elif frame == 'polygon':
                # spine_type must be 'left'/'right'/'top'/'bottom'/'circle'.
                spine = Spine(axes=self,
                              spine_type='circle',
                              path=Path.unit_regular_polygon(num_vars))
                # unit_regular_polygon gives a polygon of radius 1 centered at
                # (0, 0) but we want a polygon of radius 0.5 centered at (0.5,
                # 0.5) in axes coordinates.
                spine.set_transform(Affine2D().scale(.5).translate(.5, .5)
                                    + self.transAxes)
                return {'polar': spine}
            else:
                raise ValueError("unknown value for 'frame': %s" % frame)

    register_projection(RadarAxes)
    return theta


def example_data():
    # The following data is from the Denver Aerosol Sources and Health study.
    # See doi:10.1016/j.atmosenv.2008.12.017
    #
    # The data are pollution source profile estimates for five modeled
    # pollution sources (e.g., cars, wood-burning, etc) that emit 7-9 chemical
    # species. The radar charts are experimented with here to see if we can
    # nicely visualize how the modeled source profiles change across four
    # scenarios:
    #  1) No gas-phase species present, just seven particulate counts on
    #     Sulfate
    #     Nitrate
    #     Elemental Carbon (EC)
    #     Organic Carbon fraction 1 (OC)
    #     Organic Carbon fraction 2 (OC2)
    #     Organic Carbon fraction 3 (OC3)
    #     Pyrolized Organic Carbon (OP)
    #  2)Inclusion of gas-phase specie carbon monoxide (CO)
    #  3)Inclusion of gas-phase specie ozone (O3).
    #  4)Inclusion of both gas-phase species is present...
#     data = [
#         ['Sulfate', 'Nitrate', 'EC', 'OC1', 'OC2', 'OC3', 'OP', 'CO', 'O3'],
#         ('Basecase', [
#             [0.88, 0.01, 0.03, 0.03, 0.00, 0.06, 0.01, 0.00, 0.00],
#             [0.07, 0.95, 0.04, 0.05, 0.00, 0.02, 0.01, 0.00, 0.00],
#             [0.01, 0.02, 0.85, 0.19, 0.05, 0.10, 0.00, 0.00, 0.00],
#             [0.02, 0.01, 0.07, 0.01, 0.21, 0.12, 0.98, 0.00, 0.00],
#             [0.01, 0.01, 0.02, 0.71, 0.74, 0.70, 0.00, 0.00, 0.00]]),
#         ('With CO', [
#             [0.88, 0.02, 0.02, 0.02, 0.00, 0.05, 0.00, 0.05, 0.00],
#             [0.08, 0.94, 0.04, 0.02, 0.00, 0.01, 0.12, 0.04, 0.00],
#             [0.01, 0.01, 0.79, 0.10, 0.00, 0.05, 0.00, 0.31, 0.00],
#             [0.00, 0.02, 0.03, 0.38, 0.31, 0.31, 0.00, 0.59, 0.00],
#             [0.02, 0.02, 0.11, 0.47, 0.69, 0.58, 0.88, 0.00, 0.00]]),
#         ('With O3', [
#             [0.89, 0.01, 0.07, 0.00, 0.00, 0.05, 0.00, 0.00, 0.03],
#             [0.07, 0.95, 0.05, 0.04, 0.00, 0.02, 0.12, 0.00, 0.00],
#             [0.01, 0.02, 0.86, 0.27, 0.16, 0.19, 0.00, 0.00, 0.00],
#             [0.01, 0.03, 0.00, 0.32, 0.29, 0.27, 0.00, 0.00, 0.95],
#             [0.02, 0.00, 0.03, 0.37, 0.56, 0.47, 0.87, 0.00, 0.00]]),
#         ('CO & O3', [
#             [0.87, 0.01, 0.08, 0.00, 0.00, 0.04, 0.00, 0.00, 0.01],
#             [0.09, 0.95, 0.02, 0.03, 0.00, 0.01, 0.13, 0.06, 0.00],
#             [0.01, 0.02, 0.71, 0.24, 0.13, 0.16, 0.00, 0.50, 0.00],
#             [0.01, 0.03, 0.00, 0.28, 0.24, 0.23, 0.00, 0.44, 0.88],
#             [0.02, 0.00, 0.18, 0.45, 0.64, 0.55, 0.86, 0.00, 0.16]])
#     ]
    data = [spider_data_detected.columns[0:6],("No Malware Detected",
                           array_not_detected[0:6]),
                            ("Malware Detected",
                           array_detected[0:6])]
    return data


if __name__ == '__main__':
    N = 6
    theta = radar_factory(N, frame='polygon')

    data = example_data()
    spoke_labels = data.pop(0)

    fig, axes = plt.subplots(figsize=(20, 15), nrows=2, ncols=1,
                             subplot_kw=dict(projection='radar'))
    fig.subplots_adjust(wspace=0.25, hspace=0.20, top=0.85, bottom=0.05)

    colors = ['b', 'r', 'g', 'm', 'y','c','k']
    # Plot the four cases from the example data on separate axes
    for ax, (title, case_data) in zip(axes.flat, data):
        #ax.set_rgrids([0.2, 0.4, 0.6, 0.8])
        ax.set_title(title, weight='bold', size='medium', position=(0.5, 1.1),
                     horizontalalignment='center', verticalalignment='center')
        for d, color in zip(case_data, colors):
            ax.plot(theta, d, color=color)
            ax.fill(theta, d, facecolor=color, alpha=0.25)
        ax.set_varlabels(spoke_labels)

    # add legend relative to top-left plot
    ax = axes[0]
    labels = (spider_data_detected.index)
    legend = ax.legend(labels, loc=(0.9, .95),
                       labelspacing=0.1, fontsize='small')

    fig.text(0.5, 0.965, '5-Factor Solution Profiles Across Four Scenarios',
             horizontalalignment='center', color='black', weight='bold',
             size='large')

    plt.show()

There is no much difference in statistics for Malware detected and not detected spider graphs

In [294]:
#get in mean, stds and CI for features.
#complete the dataset visualisation.
In [389]:
import scipy.stats


def confidence_interval(data, confidence=0.95):
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n-1)
    return np.round(m-h,2), np.round(m+h,2)
In [391]:
print("Summary Statistics of Numeric columns\n")
for col in rem_num_cols:
    if(np.isfinite(np.nanmean(df[col]))):
        print("\n\t\t\t"+col+"\n")
        print("Malware Detected\t\t\t\t\t\t Malware not detected")
        print("Mean = {0}\t\t\t\t\t\t Mean = {1}".format(np.nanmean(df[df.HasDetections==1][col]),np.nanmean(df[df.HasDetections==0][col])))
        print("Standard Deviation = {0}\t\t\t\tStandard Deviation = {1}".format(np.std(df[df.HasDetections==1][col]),np.std(df[df.HasDetections==0][col])))
        print("95% CI : {0}\t\t\t\t 95% CI :{1}".format(confidence_interval(df[df.HasDetections==1][col].sample(10000)),confidence_interval(df[df.HasDetections==0][col].sample(10000))))
Summary Statistics of Numeric columns


			AVProductStatesIdentifier

Malware Detected						 Malware not detected
Mean = 49439.265625						 Mean = 45910.55078125
Standard Deviation = 12116.576171875				Standard Deviation = 16165.2841796875
95% CI : (49327.81, 49787.5)				 95% CI :(45699.61, 46328.76)

			LocaleEnglishNameIdentifier

Malware Detected						 Malware not detected
Mean = 27.354231914438856						 Mean = 28.701495929630674
Standard Deviation = 65.15730487926942				Standard Deviation = 65.78355485655821
95% CI : (26.1, 28.66)				 95% CI :(27.83, 30.4)

			CountryIdentifier

Malware Detected						 Malware not detected
Mean = 108.57525300829867						 Mean = 107.72143939273809
Standard Deviation = 63.07121571772393				Standard Deviation = 62.787957896956556
95% CI : (107.88, 110.36)				 95% CI :(107.39, 109.87)

			Census_PrimaryDiskTotalCapacity

Malware Detected						 Malware not detected
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
Mean = 4214474.0						 Mean = 2554755.5
Standard Deviation = 5229808128.0				Standard Deviation = 4094068224.0
95% CI : (528550.07, 542568.43)				 95% CI :(494168.74, 507683.45)

			Census_OEMModelIdentifier

Malware Detected						 Malware not detected
Mean = 238321.6875						 Mean = 238477.03125
Standard Deviation = 71578.234375				Standard Deviation = 71447.65625
95% CI : (237192.83, 240001.51)				 95% CI :(236836.72, 239619.72)

			Census_SystemVolumeTotalCapacity

Malware Detected						 Malware not detected
Mean = 387324.75						 Mean = 378853.53125
Standard Deviation = 323466.3125				Standard Deviation = 323734.59375
95% CI : (381812.03, 394518.35)				 95% CI :(373557.68, 386373.64)

			Census_FirmwareVersionIdentifier

Malware Detected						 Malware not detected
Mean = 33020.8125						 Mean = 33047.28515625
Standard Deviation = 21169.439453125				Standard Deviation = 21210.904296875
95% CI : (32571.34, 33404.6)				 95% CI :(32887.1, 33717.88)

			Census_OSUILocaleIdentifier

Malware Detected						 Malware not detected
Mean = 59.988654122890786						 Mean = 59.885806652485876
Standard Deviation = 44.856027562948306				Standard Deviation = 44.48881351514158
95% CI : (59.23, 60.98)				 95% CI :(59.31, 61.06)

			Census_TotalPhysicalRAM

Malware Detected						 Malware not detected
/Users/sankar/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
Mean = 6382.37353515625						 Mean = 5771.19140625
Standard Deviation = 4789.76953125				Standard Deviation = 4654.35205078125
95% CI : (6250.41, 6431.19)				 95% CI :(5714.32, 5894.71)

Summary Of the Notebook

Booelan Columns

  • When SMode is on (1), the chances of getting infected by a malware is reduced by 35 % (50 to 15%)
  • When Census_IsFlightsDiabled then, only 5% is infected with malware. But total number of systems that had the feature enabled is only 80 out of 8 million, it shows this feature is known to very few and used only by a handful of people. The reason for less malware attack % might be because the feature may not be known to majority of the people.
  • Enabling Census_IsAlwaysOnAlwaysConnectedCapable reduces chance of getting affected by a malware by 13% (50 to 37%)
  • When system is not protected (IsProtected) the chance of being affected by malware is less by 13% when coompared to those systems that have protected switched. Ironically, this was the case but the scenario might be something like If a system is not equipped with malware protection software which detects some virus, one might not even be aware of all the virus that's present in one's system easily when compared to those which have malware protection software that checks for viruses and report once found.
  • Machines running on virtual devices(Census_IsVirtualDevice) have a percentage of only 19% of them being infected. ### Catergory Columns
  • In feature Smart Screen if the status is existsnotset, the chance of a system being infected by a malware is as high as 81%. Followed by warn status with 58%.
  • System in power platform with role names Slate, EnterpriseServer and AppliancePC have less chance of vulnerability.
  • OSBuildRevision versions 2396,17914,17918,17946 and 17889 are most vulnerable for a malware affect and versions 2273,2312,2248,1378 are safest categories from malware.
  • Engine Version 1.1.14901 has considerable amount of people installed it which a malware detection percent of 30.
  • App versions 4.14.17639.18041 and 4.16.17656.18052 are safest categories.
  • Devices (Census_MDC2FormFactor) like Tablets,Servers and Detachables are less prone to malware when compared to Desktops,Notebooks and Convertibles.
  • OrganizationIdentifiers 2.0,19.0,6.0,21.0,26.0 are less vulnerable and 50.0 is most vulnerable.
  • It looks mostly that with numbe of battery charges increases the system becomes less vulnerable to virus (Census_InternalBatteryNumberOfCharges).
  • Census_ProcessorManufactureridentifier has very less 3.0 and 10.0 versions but their attack percentage are less than 25% mainly 10.0 with just 1% malware detected.
  • _FirmwareManufacturers 1038,182,1040,869,357 are most safest from Malware and manufacturers 658,687,301 and 1014 are most vulnerable of malware (Census_FirmwareManufacturererIdentifier).
  • Processor Core count 1 is most safest, followed by 2. Cores 12 and 16 are most vulnerable to malware.
  • Anti virus products 3.0,4.0 and 5.0 having an average percentage of 27% of malware which being leastand safe form malware. Versions 1.0 with 55% most vulnerable. It explains because it being first version and many hacks might have happened which inturn made Anti virus teams release next versions.)(AVProductsInstalled)
  • If 2 or 3 anti virus products are enabled then system is 70% safe from malware. (AVProductsEnabled).
  • RtpStateBitfield which represents Real time transport protocol which helps in streaming audio and video over internet and bitfield 8.0 is 76% vulnerable to virus and 5.0 is safest followed by 3.0.
  • Census_OSBuildNumber and OSBuild contains same information, almost. OSBuild 17713,17672,17661,17682 are most safest builds from malware. Alternatively, 7600 is most vulnerable.
In [ ]:
 
In [ ]:
 
In [ ]: